ABSTRACT

Genetic changes underlie tumor progression and may lead to cancer-specific expression of critical genes. Over 1100 publications have described the use of comparative genomic hybridization (CGH) to analyze the pattern of copy number alterations in cancer, but very few of the genes affected are known. Here, we performed high-resolution CGH analysis on cDNA microarrays in breast cancer and directly compared copy number and mRNA expression levels of 13,824 genes to quantitate the impact of genomic changes on gene expression. We identified and mapped the boundaries of 24 independent amplicons, ranging in size from 0.2 to 12 Mb. Throughout the genome, both high- and low-level copy number changes had a substantial impact on gene expression, with 44% of the highly amplified genes showing overexpression and 10.5% of the highly overexpressed genes being amplified. Statistical analysis with random permutation tests identified 270 genes whose expression levels across 14 samples were systematically attributable to gene amplification. These included most previously described amplified genes in breast cancer and many novel targets for genomic alterations, including the HOXB7 gene, the presence of which in a novel amplicon at 17q21.3 was validated in 10.2% of primary breast cancers and associated with poor patient prognosis. In conclusion, CGH on cDNA microarrays revealed hundreds of novel genes whose overexpression is attributable to gene amplification. These genes may provide insights into the clonal evolution and progression of breast cancer and highlight promising therapeutic targets.

INTRODUCTION 

Gene expression patterns revealed by cDNA microarrays have facilitated classification of cancers into biologically distinct categories, some of which may explain the clinical behavior of the tumors (1-6). Despite this progress in diagnostic classification, the molecular mechanisms underlying gene expression patterns in cancer have remained elusive, and the utility of gene expression profiling in the identification of specific therapeutic targets remains limited.

Accumulation of genetic defects is thought to underlie the clonal evolution of cancer. Identifying the genes that mediate the effects of genetic changes may be important because it highlights transcripts that are actively involved in tumor progression. Such transcripts and their encoded proteins would be ideal targets for anticancer therapies, as demonstrated by the clinical success of new therapies against amplified oncogenes, such as ERBB2 and EGFR (7, 8), in breast cancer and other solid tumors. Besides amplifications of known oncogenes, over 20 recurrent regions of DNA amplification have been mapped in breast cancer by CGH (9, 10). However, these amplicons are often large and poorly defined, and their impact on gene expression remains unknown.

Fig. 1. Impact of gene copy number on global gene expression levels. A, percentage of over- and underexpressed genes (Y axis) according to copy number ratios (X axis). Threshold values used for over- and underexpression were >2.184 (global upper 7% of the cDNA ratios) and <0.4826 (global lower 7% of the expression ratios). B, percentage of amplified and deleted genes according to expression ratios. Threshold values for amplification and deletion were >1.43 and <0.73.

We hypothesized that genome-wide identification of those gene expression changes that are attributable to underlying gene copy number alterations would highlight transcripts that are actively involved in the causation or maintenance of the malignant phenotype. To identify such transcripts, we applied a combination of cDNA and CGH microarrays to: (a) determine the global impact of gene copy number variation on breast cancer development and progression; and (b) identify and characterize those genes whose mRNA expression is most significantly associated with amplification of the corresponding genomic template.

The abbreviations used are: CGH, comparative genomic hybridization; FISH, fluorescence in situ hybridization; RT-PCR, reverse transcription-PCR.

Fig. 2. MCF-7 breast cancer cell line. A, chromosomal CGH analysis of MCF-7. The copy number ratio profile across the entire genome from the 1p telomere to the Xq telomere is shown along with ±1 SD (orange lines); the black horizontal line indicates a ratio of 1. B, genome-wide copy number analysis in MCF-7 by CGH on cDNA microarray. The copy number ratios were plotted as a function of the position of the cDNA clones along the human genome; individual data points are connected with a line, and a moving median of 10 adjacent clones is shown. C, copy number profile annotated with expression data: dots mark the upper expression ratios in MCF-7 cells (overexpressed genes), light green dots indicate the lowest 2%, and dark green dots the next 5% of the expression ratios.
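The microarray copy number profile in Fig. 2 is summarized by a moving median of 10 adjacent clones. A minimal Python sketch of such a smoother follows, assuming the ratios for one chromosome are already sorted by base-pair position; the function name `moving_median` and the edge handling (the window is shifted inward at the chromosome ends so it still spans 10 clones when available) are our assumptions, not the authors' implementation.

```python
from statistics import median


def moving_median(values, window=10):
    """Median of `window` adjacent clones around each position (hypothetical helper).

    `values` are copy number ratios for one chromosome, sorted by position.
    Near the ends the window is shifted inward rather than padded.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(values)
    left = (window - 1) // 2  # for window=10: 4 clones before, 5 after
    smoothed = []
    for i in range(n):
        start = max(0, i - left)
        end = min(n, start + window)   # exclusive end
        start = max(0, end - window)   # shift left when truncated at the right end
        smoothed.append(median(values[start:end]))
    return smoothed
```

A median rather than a mean keeps a single aberrant clone from distorting the profile, while a genuine run of amplified clones still shifts the smoothed curve.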

MATERIALS AND METHODS



Breast Cancer Cell Lines. Fourteen breast cancer cell lines (BT-20, BT-474, HCC1428, Hs578t, MCF7, MDA-361, MDA-436, MDA-453, MDA-468, SKBR3, T-47D, UACC812, ZR-75-1, and ZR-75-30) were obtained from the American Type Culture Collection (Manassas, VA). Cells were grown under recommended culture conditions. Genomic DNA and mRNA were isolated using standard protocols.

Copy Number and Expression Analyses by cDNA Microarrays. The preparation and printing of the 13,824 cDNA clones on glass slides were performed as described (11-13). Of these clones, !244 represented uncharacterized expressed sequence tags; the remainder corresponded to known genes. CGH experiments on cDNA microarrays were done as described (14, 15). Briefly, 20 μg of genomic DNA from breast cancer cell lines and normal human WBCs were digested for 14-18 h with AluI and RsaI (Life Technologies, Inc., Rockville, MD) and purified by phenol/chloroform extraction. Six μg of digested cell line DNAs were labeled with Cy3-dUTP (Amersham Pharmacia) and normal DNA with Cy5-dUTP (Amersham Pharmacia) using the BioPrime Labeling kit (Life Technologies, Inc.). Hybridization (14, 15) and posthybridization washes (13) were done as described. For the expression analyses, a standard reference (Universal Human Reference RNA; Stratagene, La Jolla, CA) was used in all experiments. Forty μg of reference RNA were labeled with Cy3-dUTP and 3.5 μg of test mRNA with Cy5-dUTP, and the labeled cDNAs were hybridized on microarrays as described (13, 15). For both microarray analyses, a laser confocal scanner (Agilent Technologies, Palo Alto, CA) was used to measure the fluorescence intensities at the target locations using the DEARRAY software (16). After background subtraction, average intensities at each clone in the test hybridization were divided by the average intensity of the corresponding clone in the control hybridization. For the copy number analysis, the ratios were normalized on the basis of the distribution of ratios of all targets on the array, and for the expression analysis on the basis of 88 housekeeping genes, which were spotted four times onto the array. Low-quality measurements (i.e., copy number data with mean reference intensity <100 fluorescent units, and expression data with both test and reference intensity <100 fluorescent units and/or with spot size <50 units) were excluded from the analysis and were treated as missing values. The distributions of fluorescence ratios were used to define cutpoints for increased/decreased copy number. Genes with CGH ratio >1.43 (representing the upper 5% of the CGH ratios across all experiments) were considered to be amplified, and genes with ratio <0.73 (representing the lower 5%) were considered to be deleted.

Statistical Analysis of CGH and cDNA Microarray Data. To evaluate the influence of copy number alterations on gene expression, we applied the following statistical approach. CGH and cDNA calibrated intensity ratios were log-transformed and normalized using median centering of the values in each cell line. Furthermore, cDNA ratios for each gene across all 14 cell lines were median centered. For each gene, the CGH data were represented by a vector that was labeled 1 for amplification (ratio, >1.43) and 0 for no amplification. Amplification was correlated with gene expression using the signal-to-noise statistic. We calculated a weight, $w_g$, for each gene as follows:

$$w_g = \frac{m_a - m_n}{\sigma_a + \sigma_n}$$

where $m_a$, $m_n$ and $\sigma_a$, $\sigma_n$ denote the means and SDs of the expression levels for amplified and nonamplified cell lines, respectively. To assess the statistical significance of each weight, we performed 10,000 random permutations of the label vector. The probability that a gene had a larger or equal weight by random permutation than the original was denoted by $\alpha$. A low $\alpha$ (<0.05) indicates a strong association between gene expression and amplification.
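The normalization, labeling, weighting, and permutation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, base-2 logarithms and population SDs are our choices (the text specifies neither), a tiny `eps` floor guards against a zero denominator, and the random seed is fixed only for reproducibility.

```python
import math
import random
from statistics import mean, median, pstdev

AMP_CUTOFF = 1.43  # CGH ratio above which a clone counts as amplified


def center_log_ratios(matrix):
    """log2-transform a genes x cell-lines matrix of ratios, then median-center
    each cell line (column) and each gene (row) across cell lines."""
    logged = [[math.log2(x) for x in row] for row in matrix]
    n_cols = len(logged[0])
    col_medians = [median([row[j] for row in logged]) for j in range(n_cols)]
    by_line = [[row[j] - col_medians[j] for j in range(n_cols)] for row in logged]
    centered = []
    for row in by_line:
        m = median(row)
        centered.append([x - m for x in row])
    return centered


def amplification_labels(cgh_ratios, cutoff=AMP_CUTOFF):
    """Label vector: 1 = amplified (ratio > cutoff), 0 = not amplified."""
    return [1 if r > cutoff else 0 for r in cgh_ratios]


def sn_weight(expr, labels, eps=1e-9):
    """Signal-to-noise weight (m_a - m_n) / (sd_a + sd_n) for one gene."""
    amp = [e for e, lab in zip(expr, labels) if lab == 1]
    non = [e for e, lab in zip(expr, labels) if lab == 0]
    if not amp or not non:
        return float("nan")  # undefined unless both groups are present
    spread = pstdev(amp) + pstdev(non)
    return (mean(amp) - mean(non)) / max(spread, eps)  # eps avoids division by zero


def permutation_alpha(expr, labels, n_perm=10_000, seed=0):
    """Fraction of random label permutations whose weight is >= the observed one."""
    observed = sn_weight(expr, labels)
    if math.isnan(observed):
        return float("nan")
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sn_weight(expr, shuffled) >= observed:
            hits += 1
    return hits / n_perm
```

Because the test is one-sided (weights greater than or equal to the observed one), a gene whose expression is highest in its amplified lines receives a small $\alpha$, whereas a gene that is lowest in those lines receives an $\alpha$ near 1.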

Genomic Localization of cDNA Clones and Amplicon Mapping. Each cDNA clone on the microarray was assigned to a UniGene cluster using UniGene Build 141.⁴ A database of genomic sequence alignment information for mRNA sequences was created from the August 2001 freeze of the University of California Santa Cruz's GoldenPath database.⁵ The chromosome and bp positions for each cDNA clone were then retrieved by relating these data sets. Amplicons were defined as a CGH copy number ratio >2.0 in at least two adjacent clones in two or more cell lines or a CGH ratio >2.0 in at least three adjacent clones in a single cell line. The amplicon start and end positions were extended to include neighboring nonamplified clones (ratio, <1.5). The amplicon size determination was partially dependent on local clone density.

⁴ Internet address: http://research.nhgri.nih.gov/microarray/downloadable_cdna.html.
⁵ Internet address: www.genome.ucsc.edu.

GENE EXPRESSION PATTERNS IN BREAST CANCER

Table 1 Summary of independent amplicons in breast cancer cell lines by CGH microarray
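The amplicon definition above can be expressed as a small calling routine. The sketch below is our reading of the rule, not the authors' software: it works on clone indices for one chromosome (the same clone order in every cell line), treats "two adjacent clones in two or more cell lines" as the same two clones exceeding 2.0 in another line, and extends each boundary outward up to and including the first flanking clone with ratio <1.5. All names are hypothetical.

```python
HIGH = 2.0   # CGH ratio for a highly amplified clone
FLANK = 1.5  # clones below this ratio count as nonamplified boundaries


def runs_above(ratios, cut=HIGH):
    """Half-open index intervals [start, end) of adjacent clones with ratio > cut."""
    runs, start = [], None
    for i, r in enumerate(ratios):
        if r > cut and start is None:
            start = i
        elif r <= cut and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(ratios)))
    return runs


def extend(ratios, start, end, flank=FLANK):
    """Widen [start, end) on each side up to and including the first clone
    with ratio < flank (or stop at the chromosome end)."""
    while start > 0 and ratios[start - 1] >= flank:
        start -= 1
    if start > 0:
        start -= 1
    while end < len(ratios) and ratios[end] >= flank:
        end += 1
    if end < len(ratios):
        end += 1
    return start, end


def call_amplicons(profiles):
    """profiles maps cell line -> CGH ratios for one chromosome.
    Returns merged [start, end) clone-index intervals."""
    runs = {line: runs_above(r) for line, r in profiles.items()}
    found = []
    for line, line_runs in runs.items():
        for s, e in line_runs:
            # recurrent: the same >=2 adjacent clones are also high in another line
            recurrent = any(
                min(e, o_end) - max(s, o_start) >= 2
                for other, other_runs in runs.items() if other != line
                for o_start, o_end in other_runs
            )
            if e - s >= 3 or recurrent:
                found.append(extend(profiles[line], s, e))
    merged = []
    for s, e in sorted(found):
        if merged and s <= merged[-1][1]:  # overlapping or touching: merge
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

To report sizes in Mb, the interval ends would be mapped back to the clones' bp positions; this is also why, as noted above, the size estimate depends on local clone density.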

FISH. Dual-color interphase FISH to breast cancer cell lines was done as described (17). Bacterial artificial chromosome clone RP11-361K8 was labeled with SpectrumOrange (Vysis, Downers Grove, IL), and a SpectrumOrange-labeled probe for EGFR was obtained from Vysis. SpectrumGreen-labeled chromosome 7 and 17 centromere probes (Vysis) were used as a reference. A tissue microarray containing 612 formalin-fixed, paraffin-embedded primary breast cancers (17) was applied in FISH analyses as described (18). The use of these specimens was approved by the Ethics Committee of the University of Basel and by the NIH. Specimens containing a 2-fold or higher increase in the number of test probe signals, as compared with the corresponding centromere signals, in at least 10% of the tumor cells were considered to be amplified. Survival analysis was performed using the Kaplan-Meier method and the log-rank test.
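The specimen-level FISH call and the survival comparison can be sketched as below. This assumes per-cell counts of (test probe signals, centromere signals) and per-patient follow-up times with an event flag (1 = death, 0 = censored); the function names are hypothetical, and the Kaplan-Meier estimator and two-group log-rank test are written out in plain Python for illustration only, since the text does not state which software was used.

```python
import math


def fish_amplified(cells, min_fraction=0.10):
    """cells: list of (test_signals, centromere_signals) per tumor cell.
    A cell counts as amplified if test >= 2 x centromere signals."""
    if not cells:
        return False
    amplified = sum(1 for test, cen in cells if cen > 0 and test >= 2 * cen)
    return amplified / len(cells) >= min_fraction


def kaplan_meier(times, events):
    """Product-limit survival estimate as a list of (event_time, survival)."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times, events, groups):
    """Two-group log-rank test; groups holds 0/1 per patient. Returns (chi2, p)."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups) if u == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n          # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

In this framing, `groups` would mark patients whose tumors were called amplified by `fish_amplified`, and `kaplan_meier` would be run separately on each group to draw the two survival curves.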

RT-PCR. The HOXB7 expression level was determined relative to GAPDH. Reverse transcription and PCR amplification were performed using the Access RT-PCR System (Promega Corp., Madison, WI) with 10 ng of mRNA
as a template. HOXB7 primers were 5'-AGCAGAGGGACTCGGACTT-3'
and 5'-GCCTTCAGGTAGCGATTGTAG-3'.

RESULTS

Global Effect of Copy Number on Gene Expression. A total of 13,824 arrayed cDNA clones were used for analysis of gene expression and gene copy number (CGH microarrays) in 14 breast cancer cell lines. The results illustrate a considerable influence of copy number on gene expression patterns. Up to 44% of the highly amplified transcripts (CGH ratio, >2.5) were overexpressed (i.e., belonged to the global upper 7% of expression ratios), compared with only 6% for genes with normal copy number levels (Fig. 1A). Conversely, 10.5% of the transcripts with high-level expression (cDNA ratio, >10) showed increased copy number (Fig. 1B). Low-level copy number increases and decreases were also associated with similar, although less dramatic, effects on gene expression (Fig. 1).
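The percentages above compare the share of overexpressed transcripts between copy number classes, where "overexpressed" means exceeding a global upper-7% cutpoint of the expression ratios. A minimal sketch, assuming the ratios from all cell lines are pooled into flat lists and that "normal copy number" means a CGH ratio between the deletion and amplification cutpoints (0.73 to 1.43), which is our interpretation; the helper names are hypothetical.

```python
from statistics import quantiles


def upper_cutoff(values, top_fraction=0.07):
    """Global cutpoint above which the top `top_fraction` of values lie
    (`top_fraction` is assumed to be a whole percentage)."""
    pct = round(100 * (1 - top_fraction))  # 93rd percentile for the upper 7%
    return quantiles(values, n=100)[pct - 1]


def fraction_overexpressed(cgh, expr, in_class, expr_cut):
    """Share of transcripts whose CGH ratio satisfies `in_class` and whose
    expression ratio exceeds `expr_cut`."""
    selected = [e for c, e in zip(cgh, expr) if in_class(c)]
    if not selected:
        return float("nan")
    return sum(e > expr_cut for e in selected) / len(selected)
```

For example, `fraction_overexpressed(cgh, expr, lambda c: c > 2.5, upper_cutoff(expr))` gives the share of highly amplified transcripts that are overexpressed, and the same call with `lambda c: 0.73 <= c <= 1.43` gives the baseline for transcripts with normal copy number.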

Identification of Distinct Breast Cancer Amplicons. Base-pair locations obtained for 11,994 cDNAs (86.8%) were used to plot copy number changes as a function of genomic position (Fig. 2; Supplement Fig. A). The average spacing of clones throughout the genome was 267 kb. This high-resolution mapping identified 24 independent breast cancer amplicons, spanning from 0.2 to 12 Mb of DNA (Table 1). Several amplification sites detected previously by chromosomal CGH were validated, with the 1q21, 17q12-q21.2, 17q22-q23, 20q13.1, and 20q13.2 regions being most commonly amplified. Furthermore, the boundaries of these amplicons were precisely delineated. In addition, novel amplicons were identified at 9p13 (38.65-39.25 Mb) and 17q21.3 (52.47-55.80 Mb).
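As a rough consistency check on the mapping figures above, assuming a haploid human genome length of about 3.2 Gb (a value not given in the text) and that the average spacing is simply genome length divided by the number of mapped clones:

$$\frac{11{,}994}{13{,}824} \approx 0.868 = 86.8\%, \qquad \frac{3.2 \times 10^{9}\ \text{bp}}{11{,}994} \approx 2.67 \times 10^{5}\ \text{bp} \approx 267\ \text{kb}$$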

Direct Identification of Putative Amplification Target Genes. The cDNA/CGH microarray technique enables the direct correlation of copy number and expression data on a gene-by-gene basis throughout the genome. We directly annotated high-resolution CGH plots with gene expression data using color coding. Fig. 2C shows that most of the amplified genes in the MCF-7 breast cancer cell line at 1p13, 17q22-q23, and 20q13 were highly overexpressed. A view of chromosome 7 in the MDA-468 cell line implicates EGFR as the most highly overexpressed and amplified gene at 7p11-p12 (Fig. 3A). In BT-474, the two known amplicons at 17q12 and 17q22-q23 contained numerous highly overexpressed genes (Fig. 3B). In addition, several genes, including the homeobox genes HOXB2 and HOXB7, were highly amplified in a previously undescribed independent amplicon at 17q21.3. HOXB7 was systematically amplified (as validated by FISH; Fig. 3B, inset) as well as overexpressed (as verified by RT-PCR; data not shown) in BT-474, UACC812, and ZR-75-30 cells. Furthermore, this novel





Fig. 3. Annotation of gene expression data on CGH microarray profiles. A, genes in the 7p11-p12 amplicon in the MDA-468 cell line are highly expressed (red) and include the EGFR oncogene. B, several genes in the 17q12, 17q21.3, and 17q23 amplicons in the BT-474 breast cancer cell line are highly overexpressed (red) and include the HOXB7 gene. The data labels and color coding are as indicated for Fig. 2C. Insets show chromosomal CGH profiles for the corresponding chromosomes and validation of the increased copy number by interphase FISH using EGFR (red) and chromosome 7 centromere probe (green) in MDA-468 (A), and HOXB7-specific probe (red) and chromosome 17 centromere probe (green) in BT-474 cells (B).
level copy number increase. Low-level copy number gains and losses also had a significant influence on expression levels of genes in the regions affected, but these effects were more subtle on a gene-by-gene basis than those of high-level amplifications. However, the impact of low-level gains on the dysregulation of gene expression patterns in cancer may be as important as, if not more important than, that of high-level amplifications. Aneuploidy and low-level gains and losses of chromosomal arms represent the most common types of genetic alterations in breast and other cancers and, therefore, influence the expression of many genes. Our results in breast cancer extend recent studies on the impact of aneuploidy on global gene expression patterns in yeast cells, acute myeloid leukemia, and a prostate cancer model system (22-24).

The CGH microarray analysis identified 24 independent breast cancer amplicons. We defined the precise boundaries for many amplicons detected previously by chromosomal CGH (9, 10, 25, 26) and also discovered novel amplicons that had not been detected previously, presumably because of their small size (only 1-2 Mb) or close proximity to other, larger amplicons. One of these novel amplicons involved the homeobox gene region at 17q21.3 and led to the overexpression of the HOXB7 and HOXB2 genes. Homeodomain transcription factors are known to be key regulators of embryonic development and have occasionally been reported to undergo aberrant expression in cancer (27, 28). HOXB7 transfection induced cell proliferation in melanoma, breast, and ovarian cancer cells and increased tumorigenicity and angiogenesis in breast cancer (29-32). The present results imply that gene amplification may be a prominent mechanism for overexpressing HOXB7 in breast cancer and suggest that HOXB7 contributes to tumor progression and confers an aggressive disease phenotype in breast cancer. This view is supported by our finding of amplification of HOXB7 in 10% of 363 primary breast cancers, as well as an association of amplification with poor patient prognosis.

We carried out a systematic search to identify genes whose expression levels across all 14 cell lines were attributable to amplification status. Statistical analysis revealed 270 such genes (representing ~2% of all genes on the array), including not only previously described amplified genes, such as HER-2, MYC, EGFR, ribosomal protein S6 kinase, and AIB3, but also numerous novel genes, such as NRAS-related gene (1p13), syndecan-2 (8q22), and bone morphogenetic protein (20q13.1), whose activation by amplification may similarly promote breast cancer progression. Most of the 270 genes have not been implicated previously in breast cancer development and suggest novel pathogenetic mechanisms. Although we would not expect all of them to be causally involved, it is intriguing that 84% of the genes with associated functional information were implicated in apoptosis, cell proliferation, signal transduction, transcription, or other cellular processes that could directly imply a possible role in cancer progression. Therefore, a detailed characterization of these genes may provide biological insights into breast cancer progression and might lead to the development of novel therapeutic strategies.

In summary, we demonstrate the application of cDNA microarrays to the analysis of both copy number and expression levels of over 12,000 transcripts throughout the breast cancer genome, roughly once every 267 kb. This analysis provided: (a) evidence of a prominent global influence of copy number changes on gene expression levels; (b) a high-resolution map of 24 independent amplicons in breast cancer; and (c) identification of a set of 270 genes whose overexpression was statistically attributable to gene amplification. Characterization of a novel amplicon at 17q21.3 implicated amplification and overexpression of the HOXB7 gene in breast cancer, including a clinical association between HOXB7 amplification and poor patient prognosis. Overall, our results illustrate how the identification of genes activated by gene amplification provides a powerful approach to highlight genes with an important role in cancer, as well as to prioritize and validate putative targets for therapy development.